Abstract

We extend the Eta weather model from a regional domain into a belt domain that does not 
require meridional boundary conditions. We describe how the extension is achieved and the parallel 
implementation of the code on the Cray T3E and the SGI Origin 2000. We validate the forecast 
results on the two platforms and examine how the removal of the meridional boundary conditions 
affects these forecasts. In addition, using several domains of different sizes and resolutions, we 
present the scaling performance of the code on both systems. 


1 Introduction 

By the end of the year 2002, NASA will launch the satellite Triana. It will be the first Earth observing
mission to provide a continuous, full disk view of the sunlit Earth. Two of the instruments that Triana
will carry are EPIC, which will deliver science products such as total precipitable water, cloud height,
aerosol index, total ozone, and a global visible cloud field image, and NISTAR, which obtains precise
radiometry integrated over the entire sunlit disk. This unique set of observations has tremendous
potential to aid in our understanding of the total Earth system and the effects of natural and
human-induced changes in the global environment.

As part of the HPCC Program at NASA GSFC, we have started a project (called the SunFlower
Project) whose goal is to simulate some of the Triana observations and to assess the impact of Triana
data for weather and climate predictions. Using the near-continuous cloud parameters observed by
Triana and numerical simulation, we intend to produce a realistic climatology of full three-dimensional
daily global cloud coverage.

For the simulation of the atmosphere within this project we are using the Eta model [4]. In
order to compare Triana and the Eta model data on approximately the same grid without significant
downscaling, the Eta model will be integrated at a resolution of about 15 km. The integration domain
(from -70 to +70 deg in latitude and 150 deg in longitude) will cover most of the sunlit Earth disc and
will continuously rotate around the globe following Triana. The cloud data assimilation is intended to
run and produce 3D clouds on a near real-time basis. The moving domain will get its lateral boundary
conditions from a lower resolution belt domain. Such a numerical setup and integration design is
very ambitious and computationally demanding in terms of memory management, efficiency of the
code and accuracy of the forecast produced.

The Eta model was originally designed for regional integration domains. Rancic et al. were able
to expand the regional integration domain into a belt-like domain [6]. This belt-like domain was still
treated as a large regional domain in the sense that it used meridional boundary conditions (no
periodicity). Although that choice produced acceptable forecasts [6], it could not meet the
requirements of the SunFlower Project. The National Centers for Environmental Prediction (NCEP)
subsequently released a new version of the Eta model code. We modified this code for belt domain
integrations, which are meant to provide lateral boundary conditions for the moving domain in the
SunFlower Project. This new code, called the Eta-belt, no longer requires meridional boundary
conditions; instead, it incorporates periodicity. The objective of this paper is to evaluate the Eta-belt
in terms of both parallel performance on the Cray T3E and the SGI Origin 2000 and forecast results.

An outline of this paper is as follows. Section 2 gives a general description of the Eta weather 
model. Section 3 explains the strategy used to extend this model from a regional domain into a belt 
domain. In Section 4, we provide an overview of the two platforms. Numerical experiments appear in 
Section 5. We formulate some remarks and conclusions in Section 6. 


2 The Eta Model 

The Eta model [4] is a limited-area atmospheric model that serves as the major regional model at 
NCEP. It is a primitive-equations model based on finite differencing for the computation of atmospheric 
dynamics and physics under hydrostatic assumptions. The model employs the concept of
“step-mountain” vertical coordinates and uses a semi-staggered horizontal distribution of variables, known
as the Arakawa E-grid. Two major principles built into the model’s design are: maintaining integral
constraints of the continuous equations within finite-differencing approximations, and minimizing, or
completely avoiding, artificial filtering of short waves. The model also utilizes a variety of sophisticated
physical parameterization schemes. The Eta model is used for real-time forecasting by many groups at
institutions worldwide, and it has shown remarkable skill in forecasting precipitation, as well as
the development and movement of severe storms [4, 5]. For general information on the Eta model,
refer to the following web site: http://www.srh.noaa.gov/ftproot/ssd/NWPMODEL/HTML/eta.htm.

The original version of the Eta model was designed and optimized for efficiency on vector based 
architectures. Computational demands for very fine resolution and large problem domains motivated 
the development of a distributed memory parallel option for the Eta model. The Eta model is par- 
allelized using a standard two-dimensional data domain decomposition. Two- and three-dimensional 
data arrays containing prognostic variables (wind velocity, temperature, moisture, and pressure), as 
well as diagnostic and intermediate fields, are horizontally partitioned, and the resulting subdomains 
are distributed over the available processors (see Figure 1). Computations on the horizontal mesh use 
explicit time differencing. 

The parallel code for the Eta model uses two types of communication: local and global. Local 
communications, where the data are exchanged only by neighboring processors, are typical for explicit 
time-differencing. Global communications involve the computations done by the master processor, 
which are then distributed among all processors, and are mainly used for I/O procedures within the
Eta model code. 

The parallel Eta code is written in Fortran 90 and uses MPI for interprocessor communications. 

Figure 1: Domain decomposition and mapping: the N x M grid points are mapped onto p processors that are
arranged as a q x p/q array. Each processor gets N/q x qM/p grid points.
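
The mapping described in Figure 1 can be illustrated with a short sketch. This is not taken from the Eta code: the function names `block_bounds` and `decompose`, the row-by-row rank ordering, and the way leftover points are assigned are all assumptions made for illustration.

```python
def block_bounds(n_points, n_blocks, block):
    """Return the half-open index range [start, stop) owned by `block`.

    Points are split as evenly as possible; the first
    `n_points % n_blocks` blocks each receive one extra point.
    """
    base, extra = divmod(n_points, n_blocks)
    start = block * base + min(block, extra)
    stop = start + base + (1 if block < extra else 0)
    return start, stop


def decompose(n, m, q, r):
    """Map an n x m horizontal grid onto a q x r processor array.

    Returns a dict: rank -> ((i_start, i_stop), (j_start, j_stop)).
    Ranks are numbered row by row (a hypothetical ordering).
    """
    layout = {}
    for row in range(r):
        for col in range(q):
            rank = row * q + col
            layout[rank] = (block_bounds(n, q, col), block_bounds(m, r, row))
    return layout


# Horizontal grid of Domain 1 (720 x 201) on 64 processors arranged as 8 x 8.
layout = decompose(n=720, m=201, q=8, r=8)
```

Every vertical level of a two- or three-dimensional array is partitioned the same way, so only the horizontal indices appear in the sketch.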


3 Design of the Wrap-Around Belt-Domain 

In the above parallel implementation of the Eta model, it is assumed that the integration domain is a 
regional domain requiring zonal and meridional boundary conditions. Our objective is to extend the 
regional domain into a belt one where there are no meridional boundary conditions. 

Rancic et al. first implemented a belt-like domain with the Eta code. This domain was a large
regional one still having meridional boundary conditions. However, their implementation kept all 
the properties of the Eta model. In particular, their numerical experiments showed that their belt 
model generally produces better skill than the regional model and more significant improvement with 
the increase of resolution [6]. With the release of a new version of the Eta code (having improved 
physics, more vertical levels, etc.), we initiated an effort to design a fully periodic belt model. The 
idea was to modify the model integration code (on parallel computers), as well as the preprocessing 
and postprocessing procedures (on workstations), by stretching the regional domain in the left and 
right directions so that the left and right boundaries overlap. 


3.1 Preprocessing 

The preprocessing system converts global analysis data from NCEP into initial and boundary condi- 
tions for the Eta model. This system had to be adapted to provide the periodic initial and boundary 
conditions needed by an equatorial belt domain rather than the isolated regional domain for which it 
was designed. Rather than explicitly enforce periodicity throughout this lengthy code, the 360 degree 
belt region is extended by several degrees at either end, and these end regions are correctly loaded 
with input data based on the periodic requirement. With these end regions included, most of the 
preprocessing code is able to continue to regard the domain as an isolated region centered about the 
equator and prime meridian, and correct interpolation is obtained right up to the dateline without 
any explicit knowledge of the periodic boundary condition at this location. At output, the extended 
end regions are discarded so that a strictly periodic data set over 360 degrees is supplied to the model. 
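
The extend-then-discard strategy can be sketched as follows. This is a minimal illustration with NumPy, not the preprocessing code itself: the helper names `extend_periodic` and `trim`, the pad width, and the three-point smoother standing in for a routine that knows nothing about periodicity are all assumptions.

```python
import numpy as np


def extend_periodic(field, pad):
    """Append `pad` wrapped columns at each end of the longitude (last) axis."""
    return np.concatenate([field[..., -pad:], field, field[..., :pad]], axis=-1)


def trim(field, pad):
    """Discard the extended end regions (requires pad >= 1)."""
    return field[..., pad:-pad]


lon = np.arange(0.0, 360.0, 0.5)                 # 720 longitudes at 1/2 deg
belt = np.sin(np.radians(lon))[np.newaxis, :]    # one synthetic latitude row
pad = 6                                          # hypothetical width of the end regions

extended = extend_periodic(belt, pad)

# A routine written for an isolated regional domain: it only looks at
# interior points and has no knowledge of the periodic boundary.
smoothed = extended.copy()
smoothed[..., 1:-1] = (extended[..., :-2] + extended[..., 1:-1] + extended[..., 2:]) / 3.0

result = trim(smoothed, pad)
```

Because the end regions already hold the wrapped data, `result` agrees with an explicitly periodic smoother at every longitude, including the two columns adjacent to the seam.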


3.2 Model Integration


The idea here is to still work with a 360° domain and to incorporate the periodicity by modifying 
the code so that grid points along the right and left meridional boundary become neighbors (instead 
of isolated points). We used the regional domain version of the code and introduced changes in the 
domain decomposition so that each processor has right and left neighbors, unlike the case with the
original belt-like domain. In that case, the left-most processors marked one meridional boundary and the
right-most set marked the other. In this new case, processors having portions of the domain on the left
side communicate with those assigned to the right side of the domain. Figure 2 gives a representation
of this decomposition strategy. In addition, some subroutines were modified (for example the one 
performing horizontal advection) and arrays were redimensioned to reflect this change. 

Compared to the regional domain code, the new code performs a little more interprocessor 
communication but the same amount of computation. 

Figure 2: Domain decomposition for the belt domain.
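
The change in neighbor relationships can be sketched as below. The function `neighbors`, the row-by-row rank numbering, and the use of `None` to mark a physical boundary are assumptions for illustration; the actual Eta-belt code is written in Fortran 90 with MPI and may organize this differently.

```python
def neighbors(rank, q, r, periodic=True):
    """West/east/south/north neighbor ranks on a q x r processor array.

    q columns run along longitude, r rows along latitude, and ranks are
    numbered row by row. None marks a physical boundary with no neighbor.
    """
    row, col = divmod(rank, q)
    if periodic:
        # Belt domain: the left-most and right-most columns are neighbors.
        west = row * q + (col - 1) % q
        east = row * q + (col + 1) % q
    else:
        # Regional domain: the outer columns sit on meridional boundaries.
        west = row * q + col - 1 if col > 0 else None
        east = row * q + col + 1 if col < q - 1 else None
    south = rank - q if row > 0 else None
    north = rank + q if row < r - 1 else None
    return {"west": west, "east": east, "south": south, "north": north}


belt_edge = neighbors(0, q=8, r=4)                      # west neighbor is rank 7
regional_edge = neighbors(0, q=8, r=4, periodic=False)  # west neighbor is None
```

Only the processors in the first and last columns gain a neighbor, which is consistent with the remark that the belt code performs slightly more communication for the same amount of computation.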


3.3 Postprocessing 

The postprocessing system converts output data from the Eta model into a form suitable for use with 
our graphical data package. This system has two stages: one to interpolate from Eta coordinate levels
to standard pressure levels, and another to interpolate from the Arakawa E-grid to a regular Cartesian
grid. For both stages, the original isolated regional version has been modified in order to correctly 
perform the horizontal smoothing and interpolation procedures near the lateral periodic boundaries. 


4 Description of the Platforms 

For the set of experiments presented here, we employed two architectures: the Cray T3E and the SGI
Origin 2000 (SGI O2K).

Cray T3E: 

The Cray T3E is a massively parallel processor system consisting of 32 to 2048 Processing Elements
(PEs). Each PE is a 300 MHz DEC Alpha 21164 microprocessor capable of 600 million floating point
operations per second and has 64 MB to 2 GB of DRAM. We employed a 256 PE
Cray T3E configuration located at the NASA Center for Computational Sciences at NASA Goddard
Space Flight Center.


                   Domain 1          Domain 2          Domain 3
Coverage           50S-50N           70S-70N           50S-50N
Resolution         1/2 deg           1/2 deg           1/3 deg
Grid points        720 x 201 x 38    720 x 281 x 38    1080 x 301 x 38
Forecast length    48 hours          48 hours          48 hours


Table 1: Description of the domains.


SGI Origin 2000:

We used the 512-processor SGI Origin 2000 available at the Numerical Aerospace Simulation high
performance facility at NASA Ames Research Center. It is currently the largest single-image system
in existence, with one operating system and a single address space. It has 192 GB of main memory,
a 2 TB FC RAID disk subsystem, and 327 GB of disk storage. Each processor is a 400 MHz R12000.

Our implementation of the belt domain was first tested on the Cray T3E and then ported to the
SGI O2K. The only major changes made to port the code to the SGI O2K were modifications to some
system calls specific to each platform.

The Eta code requires double precision computations. On the Cray T3E, by default, each real variable
is stored in double precision. However, the default on the SGI O2K is single precision. To solve this
problem, we introduced the compilation option (-r8) that sets all real variables to double precision
and we also modified some arguments in specific MPI calls. We use general optimization levels: -O3
on the Cray T3E and -O2 on the SGI O2K.
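
The reason the MPI arguments matter once reals are promoted to 8 bytes can be illustrated with a small NumPy sketch. It does not reproduce the actual changes made to the Eta code, which are not detailed here; it only shows, under that assumption, what happens when an 8-byte buffer is described with a 4-byte element type.

```python
import numpy as np

# Reals as they are stored once every real variable is double precision.
values = np.array([1.0, 2.0, 3.0], dtype=np.float64)
raw = values.tobytes()          # the bytes a message buffer would carry

# Receiver describing the buffer with the matching 8-byte type.
as_double = np.frombuffer(raw, dtype=np.float64)

# Receiver still describing the buffer with a 4-byte real type:
# twice as many elements, none of them the values that were sent.
as_single = np.frombuffer(raw, dtype=np.float32)

print(as_double.size, as_single.size)
```

The same mismatch arises in a message-passing call whose datatype or count argument still assumes 4-byte reals after the compiler option has changed the storage size.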

5 Numerical Experiments 

In this section, we report the results of our experiments by briefly validating our forecasts and by
presenting the scaling performance of our code on both the Cray T3E and the SGI O2K. For this
study, we consider three domains, which were preprocessed on an SGI workstation.

Domain 1 and Domain 3 cover the region extending from 50°S to 50°N at 1/2 deg and 1/3 deg
resolution respectively. Domain 2 is of 1/2 deg resolution and covers 70°S-70°N (see Table 1). All
three domains have the same number of vertical levels (38). It is important to note that Domain 3 has
about 2.25 and 1.6 times as many grid points as Domain 1 and Domain 2 respectively. For all the
domains, the initial conditions were derived from the state of the atmosphere on June 10, 1999 as
given by NCEP’s global analysis data.
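
The size ratios quoted above follow directly from the grid dimensions in Table 1, as the short calculation below shows. The dictionary layout and variable names are ours.

```python
# Grid dimensions (longitude x latitude x levels) taken from Table 1.
domains = {
    "Domain 1": (720, 201, 38),
    "Domain 2": (720, 281, 38),
    "Domain 3": (1080, 301, 38),
}

# Total number of grid points in each domain.
points = {name: nx * ny * nz for name, (nx, ny, nz) in domains.items()}

ratio_3_to_1 = points["Domain 3"] / points["Domain 1"]
ratio_3_to_2 = points["Domain 3"] / points["Domain 2"]

print(f"Domain 3 / Domain 1: {ratio_3_to_1:.2f}")   # prints 2.25
print(f"Domain 3 / Domain 2: {ratio_3_to_2:.2f}")   # prints 1.61
```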

We compared the forecast results obtained from the Cray T3E and the SGI O2K by computing
their root-mean-square differences. The differences were negligibly small, and the forecasts
were “identical” on the two systems.
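
A root-mean-square difference of this kind can be computed as in the sketch below. The function name `rms_difference` is ours, and the two fields are synthetic stand-ins generated for illustration; they are not output from either platform.

```python
import numpy as np


def rms_difference(a, b):
    """Root-mean-square difference between two fields of identical shape."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("fields must have the same shape")
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic example only: a made-up field on the Domain 2 horizontal grid
# and a copy perturbed at round-off level.
rng = np.random.default_rng(0)
field_a = 1013.0 + rng.standard_normal((281, 720))
field_b = field_a + 1e-10 * rng.standard_normal(field_a.shape)

rms = rms_difference(field_a, field_b)
```

A value of `rms` many orders of magnitude below the typical size of the field is what one would expect when two platforms differ only in floating point round-off.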

We now briefly examine the forecasts produced by the Eta-belt. In Figure 3 we present the sea
level pressure contour plot for Domain 2 and in Figure 4 the geopotential height at 500 mb for Domain 3,
each after a 48-hour forecast period. The fields look reasonable, and the runs did not generate any
spurious boundary noise. Similar results were achieved with experiments using prescribed boundary
conditions for up to 15 days [1]. In addition, note that the contour lines at the left and right sides
match exactly. This shows that the periodicity of our Eta-belt was properly implemented.

Figure 3: Domain 2: sea level pressure after a 48 hour forecast.


As was presented in [6], we can also show that:

• The forecast skill of the Eta-belt consistently improves with the increase of the horizontal reso- 
lution. 

• Compared to the regional model, the Eta-belt generally has better skill and more significant 
improvement when the resolution is refined. 



Figure 4: Domain 3: geopotential height at 500 mb after a 48 hour forecast.



          SGI O2K             Cray T3E
CPUs      Time    Speedup     Time    Speedup
  64      1834     64.00      6426     64.00
 128      1060    110.1       3612    113.8
 256       909    129.1       2094    196.4


Table 2: Domain 1: elapsed times and speedup as function of the number of processors.


We assess the parallel performance of the Eta-belt by recording, for the three domains, the
elapsed time and the speedup as the number of processors varies (from 64 to 256). The results are
reported in Table 2, Table 3 and Table 4.
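
The speedup values in the tables appear to be normalized so that the smallest processor count run on both systems has a speedup equal to that count. That convention is inferred from the tabulated numbers rather than stated in the text, and the function names below are ours. Under that assumption, the Cray T3E column of Table 2 can be reproduced as follows.

```python
def speedup(times, base_cpus):
    """Speedup relative to the run on `base_cpus` processors.

    `times` maps processor count -> elapsed time. The result is scaled so
    that the baseline run has a speedup equal to `base_cpus`.
    """
    t0 = times[base_cpus]
    return {p: base_cpus * t0 / t for p, t in sorted(times.items())}


def efficiency(times, base_cpus):
    """Parallel efficiency: speedup divided by the number of processors."""
    return {p: s / p for p, s in speedup(times, base_cpus).items()}


# Elapsed times for Domain 1 on the Cray T3E, copied from Table 2.
t3e_domain1 = {64: 6426, 128: 3612, 256: 2094}

s = speedup(t3e_domain1, base_cpus=64)
e = efficiency(t3e_domain1, base_cpus=64)

for cpus in s:
    print(f"{cpus:4d} CPUs: speedup {s[cpus]:6.1f}, efficiency {e[cpus]:.2f}")
```

The efficiency helper is included only as a convenience for comparing runs at different processor counts; the tables themselves report speedup.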

Our first remark is that when we increase the number of processors, the elapsed times decrease.
The use of a belt domain does not deteriorate the parallel performance. This is consistent with results
obtained with regional domains [3]. Another interesting remark is that the Eta-belt code is faster on
the SGI O2K. This is because the SGI O2K uses faster processors and carries out
interprocessor communication and I/O more efficiently.

On both systems, the efficiency of the code improves as the size of the problem increases (from
Domain 1 to Domain 3). Though requiring more time to complete the runs, the Cray T3E displays
the best scalability. Extrapolating from these results, we expect that the Eta-belt would
scale beyond 512 processors on the Cray T3E. This will not necessarily be the case for the SGI O2K.
In fact, we integrated Domain 3 on the SGI O2K with 320 processors and found that the
elapsed time was about the same as the one achieved with 256 processors. Finally, because of the
amount of memory available on the SGI O2K, it is possible to integrate large problems with fewer
processors.



          SGI O2K             Cray T3E
CPUs      Time    Speedup     Time    Speedup
  64      2516      -         N/A       -
 128      1433    128.0       4917    128.0
 256      1096    167.3       2742    229.5


Table 3: Domain 2: elapsed times and speedup as function of the number of processors.



          SGI O2K             Cray T3E
CPUs      Time    Speedup     Time    Speedup
 128      2226      -         N/A       -
 224      1572    224.0       4609    224.0
 240      1511    233.0       4257    242.5
 256      1508    233.5       4166    247.8


Table 4: Domain 3: elapsed times and speedup as function of the number of processors.


For the SunFlower Project (described in Section 1), we plan to use the Eta-belt with a domain
extending from 70 deg south to 70 deg north at 1/3 deg resolution (about 1080 x 421 x 38 grid
points). This belt domain will provide lateral boundary conditions to a regional domain continuously
moving around the globe following the satellite Triana. The regional domain will extend from 70 deg
south to 70 deg north and will cover a window of 150 deg in longitude at 1/6 deg resolution (about
900 x 841 x 38 grid points). The belt and the regional domains will be integrated simultaneously in a
two-way interaction mode. Such a demanding numerical setup can only be achieved on the SGI O2K
(with 512 processors) because it offers a larger amount of memory and faster processors. We therefore
plan to use the Cray T3E for development and testing and the SGI O2K for the final SunFlower
Project experiments.


6 Conclusions

We have expanded the Eta atmospheric model from a regional domain model into a belt domain model.
Our experiments, carried out on the Cray T3E and the SGI Origin 2000, have shown that the
new code (on a belt domain) matches or exceeds the forecast skill of the original one
(on a regional domain). In addition, our forecast results were “identical” on the two platforms. The
new code runs faster on the SGI Origin 2000, but it scales better on the Cray T3E.